Bradley Strawser: There aren't many scholars and philosophers who do work on military-related ethics, and so we are lucky to have Dr. Jeremy Davis speak to us today. After his lecture, we'll have an open discussion, and we'll be using Chatham House rules, so there will be no attribution of any of the comments or questions.


Jeremy Davis: Today, I'd like to address a big question that I think philosophers, ethicists, and maybe even folks in the military spend too little time thinking about, and that is the role of algorithms in justifying killing. I’m particularly interested in hearing what service members have to say about this topic; I’m not a service member, and I know that you have a keen eye and a distinctive perspective on this topic, so I’m serious about hearing your thoughts. I’m going to set up a strict account of th[…] wrong, you tell me why.


Project Maven and Artificial Intelligence in Warfare


Project Maven, also known as the US Department of Defense Algorithmic Warfare Cross-Functional Team, was instituted in April 2017. Its aim was “to accelerate DoD’s integration of big data and machine learning” in order to “turn the enormous volume of data available to DoD into actionable intelligence and insights at speed.” More specifically, the initial phase of the project was to create, train, and ultimately deploy machine-learning technology to sift through the countless hours of data that have been collected through the military's various surveillance methods. I’m going to talk today about how that’s gone awry in some cases.




One of the more obvious reasons for something like Project Maven is to respond to a number of inefficiencies that present themselves when one is dealing with massive swaths of data. The first problem is that there’s just far more data being collected than humans can process. According to one calculation, the Air Force alone had acquired so much surveillance footage in Iraq over the course of a single year that it would take one person 24 years of working nonstop to get through it all. According to another estimate, 700,000 hours—that's 80 years—of footage was collected in 2017 alone, so obviously, there's far more data than even teams of observers could realistically deal with. Second, a lot of the data that are generated are completely useless, so spending countless analyst-hours on that is a massive waste of resources. Finally, it’s really difficult to draw connections across data and be able to identify the source of problems or potential leads to follow up on if you have multiple massive teams of people working, particularly teams that aren't necessarily speaking with each other or even focused on the same set of issues.
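As a quick sanity check on the second estimate, assuming "years" here means round-the-clock viewing, 24 hours a day for 365 days a year, the 700,000 hours do come to roughly the 80 years quoted:

```latex
\frac{700{,}000\ \text{h}}{24\ \tfrac{\text{h}}{\text{day}} \times 365\ \tfrac{\text{day}}{\text{yr}}}
  = \frac{700{,}000}{8{,}760}\ \text{yr} \approx 79.9\ \text{yr}
```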


A lot of these inefficiencies or challenges can be resolved by shifting to big-data methods—machine learning—where you can create algorithms, create systems, in which a lot of these inefficiencies can be completely eliminated. The systems can identify connections; they can sift through countless hours much better than enormous teams of potentially sleepy or bored people watching cameras all day. Machines solve that problem. Of course, the ultimate goal is not only to solve that problem, but also to create a lot of strategic advantages as well.


Although a lot of information about Project Maven is classified, the topic of artificial intelligence (AI) in warfighting has come to the public’s attention in the past few years, and there has been some discussion of it in public venues. A civilian might have first learned about Maven when Google decided that it wasn’t going to renew its contract with the Pentagon. Many of its AI researchers said they didn’t want to have anything to do with the project. A number of other companies then stepped in to fill the void, including Anduril, Clarifai, and Palantir. The last of these is probably the most widely known to you, because Palantir is one of the developers of predictive policing technologies.


Project Maven is one of the many different programs within DoD to advance AI. We're not really talking here about the killer robots phenomenon, but I’ll say more about that in a minute. We're talking specifically about the use of algorithms in developing technologies that humans ultimately will be responsible for using. There’s also a program called Atlas, which stands for Advanced Targeting and Lethality Automated System. I’ve got a quote from an Army engineer about how the algorithm isn’t making the judgment—and here we need to draw a contrast with killer robots. We're not just setting this thing free to do whatever it wants; the algorithm is providing information to aid the service member in deciding what to do. There's also been some preliminary research into outfitting tanks with AI weapons, which is going to involve machine-learning algorithms that will help not only to identify people who are known to us as targets, but also to make predictions about where people will go, what they are doing, what they're going to do, whether or not the action that we see them doing constitutes a threat, and that sort of thing.


For the sake of definition, we're going to call these “algorithmic systems.” We are talking about something that relies on big data, algorithmic processing, and/or machine learning: typically, all three. It is both historical and predictive; it both takes facts that we already have about a person, such as biometric data or whatever, and also makes predictions about where they're going to be and what they're going to do, and so aids us in making decisions. It can be used particularly in the military context, although it does have a broader range of applications.




As I said earlier, I think philosophers and ethicists have spent 
far too little time thinking about this. They're fixated instead 
on the phenomenon of extreme killer robots that we'll set 
free to do all of our bidding. We're much further away from 
those than we are from these data systems, which are being 
developed and exist now. We need to ask actual questions 
that apply to these systems, particularly because, as with 
most technologies that are already being used, we didn’t ask 
these questions before they were put into use, right? 
The Man in the Purple Hat


I want to tell you a story. This is the story of the Man in the Purple Hat. It’s from the book First Platoon by Annie Jacobsen, who's a celebrated journalist. The story takes place in Afghanistan and centers on the Persistent Ground Surveillance System, or PGSS. It happened in 2012, several years before Project Maven was instituted, so the system was kind of a precursor, and it alerts us to the problems that are latent in these systems. PGSS is basically an enormous blimp, and one of the things it does is to take 24/7 surveillance footage of all the areas relevant to its instructions, which it then organizes, processes, and tags using software that was developed by Palantir. The technology helps the military track particular individuals. It’s not really predictive; it’s just tracking. It’s meant to understand individual habits, to identify them and to say, look, here’s where they tend to go. As it acquires this data about people, it will be able to make predictions, but it’s primarily used as an identification tool.


A man named Kevin is the PGSS operator at the time the story takes place, and he’s monitoring a man in a purple hat. From many hours of surveillance, Kevin comes to believe that the Man in the Purple Hat plants IEDs in an effort to harm civilians or soldiers. Once Kevin compiles enough evidence, he takes it and runs it through Palantir software, which enables him to keep a closer eye on the Man in the Purple Hat, and even to predict where he’s going to be. Kevin's unit designates the man with a “429 package,” meaning that when the time is right, they're going to take him out. They think this is a bad guy, and it’s his time. One day, Kevin's colleagues inform him that they have the Man in the Purple Hat in their sights, and they think it’s time to call in air support and take him out. Kevin's colleague points the man out on the screen, sitting on a tractor. After a few minutes, with air support on the way, Kevin starts having second thoughts. He later told Annie Jacobsen, “I thought, wow, that looks like him, but something just gave me a tickle that it wasn’t him. Number 1, he’s not a worker. He’s a bad guy, and bad guys don’t tool around on tractors and play farmer. I said, ‘I’m certain it’s not him.’” His supervisor says, “Well, you've got five minutes to figure that out and prove me wrong.” So Kevin runs over to the Tactical Operations Center—he literally runs over there—to prove that it is the wrong man. He has the operator inspect the home of the Man in the Purple Hat, and sure enough, the man they are looking for is actually at his home.


This was a case of mistaken identity. Fortunately, they called off the strike. According to Kevin, had a computer run the algorithm on the guy on the tractor, as far as the computer was concerned, that was him: the Man in the Purple Hat. But because Kevin had been watching him for months, he knew that it wasn’t him. “I knew his face. I doubted the computer. I was right.”




That story is probably a familiar story to service members and may be kind of unremarkable to many of you. But to a civilian like myself, that story blows my mind, because we got that close to killing a completely innocent farmer. I know this stuff happens all the time. But hearing it in those details, especially from the perspective of the guy who had a gut feeling that he was right and he was correct about that, is really interesting. It draws that contrast between what the algorithm told us and what the humans with their instincts and perspective said. Again, this is a case of mistaken identity, but the algorithms are also likely to make errors concerning threats when they're dealing specifically with predictive elements, such as whether somebody who's holding something is holding a gun, or another weapon, or maybe just some farm tool, or something else. And, you know, this constitutes the full range of things that might be viewed as evidence that could justify killing someone on the basis of what ultimately turns out to be some kind of algorithmic error. Now this case had a relatively happy ending, but there are no doubt other cases in which that wasn't the result. We certainly know that from drone cases, and we have no reason to believe that errors won't happen commonly with respect to these algorithmic systems as well.


Predictive Killing 


The story of the Man in the Purple Hat calls to mind a number of distinctive ethical questions. The one I want to focus on is the role that algorithms can play in justifying killing. I mean this in the ethical sense. I’m not talking about what passes the relevant codes or restrictions that might be placed upon you in your distinctive roles, or what you can get away with in the legal sense, but ethically, what can justify using these algorithms to make decisions about killing?


I'm going to highlight just a few basic issues here about whether these things can justify killing. The first question to ask is, well, what evidence do they offer us? What is their evidentiary value? There’s a basic question here about whether the various statistical inferences at work within these algorithms can count as evidence at all. I think the answer has to be yes, it counts somewhat. Whether it’s decisive is a separate issue, but it certainly does seem to count more than, for example, a Ouija board. I think that, for obvious reasons, it would be pretty shocking if we were just relying on the evidence provided by a Ouija board. The question, it seems to me, is not whether information from an AI device could, in principle, count as evidence, but whether the algorithms, such as they are, provide sufficient evidence in the context in which they are used, for the types of things they’re expected to justify. So we don’t just want to say, look, they provide evidence; they tell us information we didn’t already have. We know that. The question isn’t whether they tell us something we didn’t already know. The question is whether they tell us enough to be able to justify what it is we're using them for. I think that’s the important ethical question.


As with most algorithmic systems, the ones we're talking about are fundamentally opaque. That’s a kind of technical term, even though the words are not technical. These systems are opaque in the sense that even those who use them couldn't explain why the algorithms have come to the conclusions that they have. In this sense, they are like a Ouija board. You don’t know why you ended up on these letters; you just know that you did. Whereas with human deliberation, you can say, “Oh, explain that again to me; I didn’t quite get that,” with a lot of these algorithms, you just get, “That’s the answer. That’s the guy.” It doesn’t explain to you why, or give you all the data points that yielded that conclusion. In fact, in many cases, it cannot do that, even on the best version of the algorithms. Moreover, the systems are prone to mistakes from bad input data. You get cases where it’s just terrible data that gets filtered through, and in many cases, you don’t know that the data is so bad until it’s too late. If Kevin's unit had killed the Man in the Purple Hat, we wouldn’t necessarily have known which data points gave us that conclusion. Incidentally, in her book, Jacobsen says they think that the system identified the wrong man because of the way that the light hit his hat. They think that the man on the tractor was actually wearing a blue hat and the cameras weren't sophisticated enough to pick up that difference because of the differences in lighting between rural Afghanistan and whichever locations were within the system’s data set, which is scary.


So that’s just about identification issues. With predictive 
issues, we're trying to make predictions about what people 
will do, and it’s really hard to do that. We're very bad at 
predicting what people will do. Algorithms are also very 
bad at predicting what people will do. This is true with 
predictive policing systems as well. We can make highly 
informed guesses. But again, we're not just talking about 
whether we should keep a closer eye on a certain person. 
We're asking whether this prediction gives us something 
like a justification for killing them, or potentially targeting 
them, right? So there really is a separate issue here about 
whether we can say, look, we think he’s going to go into 
that area, and whether he will actually do that. That's an 
important issue when we look more broadly at the predictive
versions of these technologies.


There are reasons to be skeptical about the accuracy of the 
entire system. It’s all classified. It’s not subject to audit.
By contrast, predictive policing systems can be audited.
Predictive policing systems allow police departments to 
make predictions about things like which areas are going to 
have higher crime or which individuals are more likely to 
commit crimes, and the police can target their resources to 
those areas or those people on the basis of that algorithmic 
prediction. The algorithms in predictive policing systems 
can be subjected to an audit, in which an external supervisor
comes in and explores the data. This happens all the
time. In fact, there are companies whose entire mission is 
just to do these kinds of audits. In contrast, it seems highly 
unlikely that the US government is going to give an outside 
third party access to its treasure trove of data in order for 
that third party to say whether or not these systems are 
good. An audit could be done internally, but the public at 
least is going to be skeptical of that kind of audit and its 
ability to reveal what we want to know. 


Suppose the algorithm recommends killing someone who 
is, in fact, innocent, and a soldier carries out the killing. 
Unless there’s abundant evidence that emerges after the 
fact that this person was actually innocent, the algorithm 
and those who rely on it will inevitably see this and code it
as a successful kill. If they had killed the Man in the Purple
Hat, we would have had no reason to say that they made
a mistake. Maybe later, we would get some information
about that, but I’m not super confident that someone's 
going to say, “Oh, we found that out. Let’s go adjust the 
algorithm.” What seems more likely is that it’s not going 
to be something we'd want to admit to or embed in the 
algorithm. Ideally, we do, but even in those cases, we'd have 
to have that additional information to be able to do that. 




So it seems to me that, in a way that’s almost unique to this particular domain of algorithms, we're much less likely to have accurate data. That’s not just a problem: that error compounds, because the whole point of these systems is that they take data and make inferences on the basis of it over time in an iterative way. If the initial data and the secondary and ongoing data collection process are corrupted, they're just going to keep replicating those problems to an exponential degree. This is a huge problem. If you have a software program that makes predictive text on your phone, and you tell it, “No, that’s not what I wanted,” it will say, “Okay, thanks,” and it can make corrections. But if you never tell it, “No, that’s not what I wanted,” it’s going to keep doing what it thought was good, and it’s going to think it’s getting it right.
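The worry that unverified kills get coded as successes can be made concrete with a small calculation. The Python sketch below is purely illustrative: the function name `recorded_accuracy`, the 80 percent true accuracy, and the correction rates are hypothetical assumptions, not figures from any real system. It assumes that every wrong recommendation that is never discovered after the fact is logged as a success, as described above.

```python
# Illustrative sketch only: all numbers are hypothetical assumptions,
# not measurements of any real targeting system.

def recorded_accuracy(true_accuracy: float, correction_rate: float) -> float:
    """Accuracy as it would appear in the system's own records.

    true_accuracy:   fraction of recommendations that were actually correct.
    correction_rate: fraction of wrong recommendations later discovered and
                     relabeled as errors; the rest are logged as successes.
    """
    if not (0.0 <= true_accuracy <= 1.0 and 0.0 <= correction_rate <= 1.0):
        raise ValueError("both arguments must lie in [0, 1]")
    errors = 1.0 - true_accuracy
    # Errors that nobody ever finds out about are counted as correct.
    uncorrected_errors = errors * (1.0 - correction_rate)
    return true_accuracy + uncorrected_errors


if __name__ == "__main__":
    for rate in (1.0, 0.5, 0.1, 0.0):
        print(f"correction rate {rate:.1f}: "
              f"recorded accuracy {recorded_accuracy(0.8, rate):.2f}")
```

With perfect after-the-fact correction the records match reality (0.80); as correction becomes rarer, the recorded figure climbs toward a perfect 1.00 even though the system has not improved at all. If the system were also retrained on those records, the mislabeled cases would become training examples, which is the mechanism behind the compounding worry in the talk.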


My concern is not just that these algorithms are bad, it’s 
that they’re going to continue to get worse and worse if we 
don’t have a system for stopping them—and I don’t think 
we have a system for stopping them. We know they’re 
imperfect. We don’t know how imperfect. We don’t have 
the ability to assess their imperfections. We do think, however,
that there is a positive non-zero evidentiary value. It's
not just a Ouija board; it’s doing something that, at least 
broadly, is pointing us in the right direction. It’s also going 
to give us justifications—at least in the minds of some—to 
do things that we want to be very careful about. 


We're talking about what kind of justification we're interested in. Philosophers sometimes like to draw a distinction between fact-relative justifications and evidence-relative justifications, and I think this distinction is going to be an important part of this discussion. So, fact-relative justification simply asks, was the target actually guilty? Was he actually doing the thing we thought he was going to do? This concerns a fact-relative justification. But you can imagine a case in which we get it wrong, and we say, “Hey, look, the best possible evidence we had told us that he was going to kill us.” We might think that this doesn’t make killing him right in a fact-relative sense, but we might speak of it as being evidence-relatively justified. In other words, maybe you shouldn’t be blamed for killing him. Maybe your decision should be investigated, but you shouldn't be held to account for it. You acted on the best possible evidence, so the outcome can be justified in light of the evidence, even if it’s not objectively right, or even if it’s not fact-relatively justified. Often, we're not going to know, and what we're really interested in is whether, when that algorithm says “kill,” we can kill and be justified. I think this question is our quarry.


Take the case of the Man in the Purple Hat. Would it have 
been evidence-relatively justified to kill the farmer who 
might be the Man in the Purple Hat? I’m not sure. I think 
probably not, for some complicated reasons relative to the 
story, as well: obviously, Kevin had doubts. That’s enough 
to question the evidence, but would somebody else in his 
place, who didn’t have those specific doubts, have been justified?
I’m not so sure. I’m curious to hear your thoughts.




Another point here is, what is the right evidentiary threshold? If there’s an evidence-relative justification we need to be concerned with, what is the threshold for having enough evidence? Here's a view that I want to pitch to you, and you tell me whether you think it’s wrong or you think it’s right. This is what I would call the strong view. It’s the stubborn, recalcitrant, strong view that’s not at all permissive: predictive algorithmic systems of the sort we're talking about generate insufficient evidence for an evidence-relative justification to kill. Why? Because the justificatory threshold, the evidentiary threshold for killing, is super, super high. In order to justify killing someone, we need more than just a computer telling us that that’s the guy, and he’s going to kill somebody. If you've seen the movie Minority Report, you're familiar with the problem here. So this is what I call the strong view: you can’t do it. It’s not evidence-relatively justified. It might be fact-relatively justified. It might turn out that the target actually was the bad guy. But our evidence didn’t give us enough information to be able to draw that conclusion. Our evidence was insufficient. That’s the point that I’m pushing on this class today, and I want you to push back against me and tell me if and why you think that’s wrong.


As I said, the evidence that’s generated by the technology is insufficient to meet that threshold. Here’s a set of thresholds that you might endorse. Now, these thresholds are about liability, but you might think of them as evidence thresholds as well. Michael Zimmerman is a philosopher; Adil Haque is a legal scholar; and the other—you might recognize the guy—is Barack Obama. Actually, the Obama administration's threshold is the most restrictive, and I think it’s closest to what I would be inclined to endorse: we would need near-certainty that the target is present, and that there would be no noncombatant collateral deaths. Adil Haque, who is a legal scholar at Rutgers, thinks that with liability, we need evidence that it’s more likely than not that the target is liable. Michael Zimmerman says that it has to be likely enough that the person is liable. That seems to me vague, because we want to know how likely is likely enough. It kind of kicks the can down the road a bit. As for more likely than not—50 plus 1—that seems too low to me, frankly. Near-certainty is what I’m going for. I want a threshold that’s quite high.
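To see how far apart these standards sit, here is a small Python sketch. It encodes only two of them, since Zimmerman's "likely enough" has no number attached. Reading "more likely than not" as a probability above 0.5 follows the talk; the 0.99 cutoff for "near-certainty" and the name `thresholds_met` are hypothetical choices made for illustration. The sketch also assumes an algorithm's confidence score could be read as a calibrated probability, which an opaque system may not guarantee.

```python
# Hypothetical encoding of two evidentiary thresholds from the talk.
# "more likely than not" -> p > 0.5   (as stated in the talk)
# "near-certainty"       -> p >= 0.99 (an assumed cutoff, for illustration)

NEAR_CERTAINTY = 0.99  # hypothetical value; the talk gives no number


def thresholds_met(p: float) -> dict[str, bool]:
    """Report which standards a probability estimate p would satisfy."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    return {
        "more likely than not": p > 0.5,
        "near-certainty": p >= NEAR_CERTAINTY,
    }


if __name__ == "__main__":
    # An estimate of 0.8 clears the lower bar but not the higher one.
    for estimate in (0.51, 0.8, 0.995):
        print(estimate, thresholds_met(estimate))
```

On this reading, the disagreement is about the whole band between the two checks: any estimate above 0.5 but below the near-certainty cutoff would license a strike under the lower standard and rule it out under the higher one.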
The MQ-1B Predator UAV
Strawser: Jeremy, just thinking through these epistemic standards, it’s a fascinating thought experiment you're setting up here. But if you're not sure how you feel about these standards, one thing I encourage people to do is to consider the basic philosophical reflection of reciprocity. Which one of these standards would you want an adversary to apply to you? I would want it to be […]

Davis: Right! Ask what you could justify to someone else. If you were speaking to the family after you killed their brother, and said, “Look, it was a 50:50 shot and, like, 50-plus; we were right there,” they'd say, “No, you killed him.” Again, these decisions are based on algorithmic systems, and are typically being made by people who are far removed from the action. This also involves the question of necessity. In addition to the question of whether the evidence can justify killing him, we need to answer other morally relevant questions as well. One of these questions is whether it’s necessary to kill him now. Think about the Man in the Purple Hat. It was not necessary to kill him at that moment. They could have waited. In fact, they did wait, and it was fine, right? But what if he were a serious target who was about to do something horrible? Even so, there was no evidence that, right then, when he was sitting on the tractor taking a nap, he was going to do something awful. He didn’t have any kind of trigger or anything in his hand, right? So based on necessity or last resort, which is a condition that you're all familiar with in the law of armed conflict, or in the conditions of jus in bello—the conditions that apply to soldiers in war—it doesn’t seem like we passed that test, so that’s going to challenge justification as well.


It’s not just a matter of whether you have enough evidence; it’s also a matter of whether all the relevant moral conditions are satisfied. Just to give you a sense of what I’m after here, you might also think about whether there are other less harmful alternatives that are available. The algorithm says, “This is our guy.” But maybe we need to gather further evidence from alternative sources that aren’t the same algorithm. Unfortunately, we might have to expose some of our service members to the risk of getting harmed in order to confirm the algorithm. Or we could apprehend the subject rather than killing him, right? Those are reversible approaches that involve satisfying our moral constraints and not potentially killing innocent people.


There’s a question here that’s not just about the specific instance. We want to zoom out a little bit. We can look at the case of the Man in the Purple Hat or this other thought experiment, and we might ask whether it was justified. Is the practice of relying on these systems justified? Is the practice itself justified? Is it evidence-relatively justified to institute the practice of relying on these systems in a military context? Again, these are points I’ve already made.


We have unreliable evidence. In fact, I think we have really good reasons to believe that the evidence is bad—better than human evidence in some cases, but potentially corrupted in ways that, again, risk that it’s just going to be replicated over time. Given that we know this, we might say that the practice of using these to justify killing is specious. There’s a whole other talk we could have about this: the risk of over-reliance, the resort to using this method rather than all these alternatives .... People who are skeptical of the usage of drones will make the same point: they’ll say, look, in isolated incidents it is particularly valuable, but they might worry that reliance on this system, on the whole, generates problems that wouldn't be there if we were more cautious about its use.




There's a concern here, too, that this might entrench our attitudes against putting people in harm’s way, which is good—I don’t want any of you to be harmed—but allowing for that to be further entrenched in our attitudes will keep us at a distance from the kinds of acts that we're committing, and will make us more likely to rely on automated systems for reasons that are meant to keep the public happy and that sort of thing.


I'll pause here now to get a sense of what you think about
this. I love to have disagreements. If you think I’m dead 
wrong, you can tell me I’m dead wrong and tell me the 
reasons why. I won't be offended. 


SPEAKER 1: We were talking earlier about the development of the algorithms for self-driving cars, and how right now, in that experimental phase, we need to just accept some risk to develop those algorithms. So, looking at the scenarios you've presented here, and knowing that the United States is not the only competitor developing this type of stuff, we do have the highest ethical standards that we have to apply to it—hence this entire discussion. But our adversaries don’t have those same standards, at least to my understanding. They're going to accept a lot more risk, so I think, looking at this situation holistically, we need to accept that same level of risk, so that whenever we enter into an actual shooting war with those competitors, we're not behind them. The systems that are being developed could potentially save far more lives than are being mistakenly struck now based on those algorithms. So, not to be callous, but looking at the greater good, couldn't a few bad strikes resulting in civilian casualties now potentially save a brigade from destruction 20 years from now?


Davis: I think this question relies on some empirical assumptions
that I’m not sure about. Is it true that our using
this software now is going to be important for us to beat 
our adversaries? Why not have the systems developed and 
run them alongside what we're doing, but not use them to 
try to justify killing? For example, in predictive policing, 
you can run these systems and say, “We think that guy's 
going to kill somebody. Oh, interesting! He didn’t kill any- 
body. Okay, the algorithm was wrong. Let's recalibrate it. 
We know the world turned out this way. He didn’t actually 
kill someone.” It’s like you're playing computer simulations 
of games before they happen. There’s a sports blog that 
actually does this: it runs Madden simulations before every 
game and predicts how things are going to go. They're not 
actually doing anything, right? It’s like you're running a 
simulation alongside the real world to try to get a sense of 
how things will actually play out in the real world. 
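Running a system alongside real decisions without acting on it is often called a "shadow mode" trial. A minimal Python sketch of how such a trial might be scored, assuming each logged prediction can later be checked against what actually happened; the `ShadowRecord` type, its field names, the `score` function, and the sample records are all hypothetical, invented for illustration.

```python
# Hypothetical scoring of a "shadow mode" trial: predictions are logged but
# never acted on, then compared with outcomes established afterward.
# The records below are invented for illustration.
from dataclasses import dataclass


@dataclass
class ShadowRecord:
    predicted_threat: bool  # what the algorithm flagged
    actual_threat: bool     # what was later established to be true


def score(records: list[ShadowRecord]) -> dict[str, float]:
    tp = sum(r.predicted_threat and r.actual_threat for r in records)
    fp = sum(r.predicted_threat and not r.actual_threat for r in records)
    fn = sum(not r.predicted_threat and r.actual_threat for r in records)
    flagged, real = tp + fp, tp + fn
    return {
        # Of the people flagged, what fraction really were threats?
        "precision": tp / flagged if flagged else 0.0,
        # Of the real threats, what fraction were flagged?
        "recall": tp / real if real else 0.0,
    }


if __name__ == "__main__":
    trial = [
        ShadowRecord(True, True),
        ShadowRecord(True, False),  # flagged but innocent, like the tractor driver
        ShadowRecord(True, False),
        ShadowRecord(False, True),
        ShadowRecord(True, True),
        ShadowRecord(False, False),
    ]
    print(score(trial))
```

Because nothing was done on the basis of these predictions, the false positives stay in the record as false positives instead of disappearing, which is exactly the feedback that live use tends to lose.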


Why not think that we should be doing something like that? In other words, instead of employing the algorithms just yet, we could just test them so that we can actually get to a better efficacy at a certain point. So we can say, look, these things are perfect. They never fail. We've run them for so long that the data are perfect. I’m skeptical that they would stay perfect, but the point is, if we have a greater sense of their ability, I'd be more inclined to use them. I’m also worried that—and you all know this at the level of hearts and minds—when you kill innocent people, innocent guys on tractors, you're likely to find yourself in a situation where you're producing more adversaries and creating a Whack-a-Mole problem that’s worse than if you had apprehended the guy and then realized that he’s not the right guy. The consequentialist perspective says we know we're causing harm, but in the long run, I think it’s a worthwhile point.


Srrawsenr: The problem with consequentialism is that 
it’s always relying on prediction. One would have to have a 
stronger epistemic prediction that it really will work out in 
the long run for your argument to work. You're saying that 
you have greater epistemic confidence in increased future 
adversarial conflict and loss of life due to the technology, 
than that there is a way for us to achieve our goal now 
without having knowingly failed the evidentiary standard 
for killing when we think it’s justified in the present. You'd 
have to have a lot of confidence—a tremendous amount 
of confidence—in that prediction, and that’s my issue 
with the consequentialist move here. Consequentialism
is certainly a valid approach to take, but for it to be sound
every time, we'd have to have a crystal ball. When we don’t 
have the required epistemic confidence, I worry we are
trading one near-certainty—present-moment knowledge
that an action is wrong—for less certain predictions that
the scales will eventually balance. 


Davis: I also worry that the calculus that we'd do more 
good than harm over the long run is something we'd resort 
to a bit too quickly. But the point is a good one.


Speaker 2: I was participating in the Warfare Innovation
Workshop last week, and my team was actually
debating justified killing. We need to use the kill chain
as a framework to understand how we make decisions to
kill the targets without algorithms. Basically, once you fix,
find, and track your objective, it goes to the operations
center and the command has to do many, many activities:
they have to check rules of engagement, the law of armed
conflict, probably estimate collateral damage, while not
even knowing if that target is valuable, or if there are more 
valuable targets that you are more interested in. So there 
are many things that are going on that machines as of 
now are not able to do. If we want the machines to do the 
killing, they have to actually do this process, these steps in 
the kill chain process. 


As of now, there are two types of artificial intelligence: we 
have narrow artificial intelligence, which includes things 
like computer vision, detecting objects, very narrow tasks,
as opposed to artificial general intelligence, which is like
machines performing the same things that humans do, or
even better. Scholars agree that it would take hundreds of 
years to achieve that kind of performance because you need 
to model perceptions and emotions, and we're not there 
yet. So we concluded that if we want to leverage algorithms 
to make decisions, we needed to change the kill chain pro- 
cess. Everybody was saying, “Yeah, man in the middle, man 
in the middle.” But what if the machines are giving you all 
the information and you just have to say, “Yes, press the 
trigger,” but they are training on bad information, like the 
case of the Man in the Purple Hat? Somehow we need to 
guarantee that the man in the middle does not have to be 
involved in the decision whether to kill, but is guaranteed 
to be training the algorithm. He can feed new information 
into the algorithm so it will train to process information as 
we humans want it to. We also recommended changing our 
military planning process, so, in the same way that we have 
appendices for operational assessment—how to collect 
information—perhaps we need an appendix for how to 
incorporate this information to train algorithms, so we 
don’t make the wrong decisions. 


Davis: Does it strike you as feasible or plausible to make 
the people who are training algorithms the same people
who are making the decisions about when specifically to
use them? That seems like a lot on them. 


Speaker 2: Somehow the process has to involve the guy
who trains the algorithm: the operators, the decision 
makers. If you don’t include these people in the process of 
training the algorithm, you're only relying on black boxes. 
There’s a whole field called machine-learning interpret- 
ability—it doesn’t only happen in the military—but if you 
use an algorithm to help you detect diseases in an x-ray, 
for example, you don’t want to rely only on a black box. 
With machine-learning interpretability, you can definitely 
see where the algorithm is looking at the image. You can 
avoid bias for race and gender. Machine learning is already 
working dependably in many little fields, so people start to 
trust the algorithms, because right now they’re only doing 
little tasks. 


We are talking about fully autonomous systems for 
defensive purposes. Imagine a hypersonic missile. You 
cannot rely on calling your boss: “Hey, we should...” 
There isn’t time. Perhaps you need your defense system to 
be fully autonomous, but for offensive operations involving 
this stealthy assassin role for the machines, then you will 
need the man in the middle. You also have fully autono- 
mous systems that can do things that help the force. For 
instance, if there’s a big fire and you need to rescue people, 
you probably don’t want to risk people's lives to do it; you 
want a fully autonomous system. Right now at the United 
Nations, they are trying to ban lethal autonomous weapons
systems. The narrative that they are selling is killer robots, 
not this stealthy assassin role for the machines. So basically, 
they’re going to end up banning all the fully autonomous 
weapon systems: the ones that really help in the military 
for good purposes, and also the killing machines. 


Davis: You've made a lot of great points there. Thank you 
for your perspective. 


Speaker 3: I would argue that the epistemic system is
going to depend on the theater of war. Are we in a limited
conflict? Are we in an unlimited conflict? Are we waging 
total war? It’s also going to reflect the rules of engagement. 


To tap into an earlier speaker's point and your response
to it, I think that, at some point, if we ran parallel tests,
and we started getting information from an autonomous 
system that would have improved—maybe not perfected, 
but improved—our decision making, I think we would 
have a moral imperative to use it. Otherwise, if we're 
saying, “Well, yeah, it’s only at 98 percent; it’s not perfect 
yet, so we're not going to let the ground force commander 
or the kill chain have access to this information,” then 
we'll just continue to make mistakes. And we already make 
mistakes. Everybody who conducts killing makes mistakes
at some point. 


Strawser: The comparison’s not in a vacuum. It’s against
humans. 


Davis: That’s a great point, and it comes out with a lot of 
these technologies. With predictive policing, for example, 
it’s bad. We mess up a lot. Also, it’s really good. It’s more 
effective in many cases than just relying on human officers. 
These systems can digest 80 years of data simultaneously, 
which is amazing, and they are surely going to do better 
than humans in a lot of cases. They're going to expose 
officers or soldiers to less risk. It’s going to result in fewer 
hours of wasted manpower. That's phenomenally valu- 
able. I think you're right that there’s a real challenge here. 
Shouldn’t that mean that we have really strong reasons to 
use the technology unless we have good evidence that it’s 
bad, that it’s getting it wrong in that particular instance? 
It’s a good point. It’s a good challenge on the other side to 
say these are really good, really effective systems, and—it 
would depend on the specifics, as you say—but in many 
cases, it seems like to not use that data would be like not
outfitting our soldiers with shields that we know are really 
effective in preventing harm. That's a valuable point. 


Speaker 4: Do you think it reduces risk or just transfers
risk to somebody else? In this program, we've learned
that all models are wrong, but are sometimes useful. So
approaching it like that, there’s the target engagement 
authority, or the UAV operator out in Nevada with a team 
sitting next to him, and the team says, “Well, the model 
said this” or “The algorithm said this,” and “We were
wrong” or “We were justified.” Yes or no doesn’t really 
matter. The question is, where did that risk go? You had 
said that benevolent leaders are looking at this and making 
improvements, but one other term to add to the rules of 
engagement might be “imminence.” Is the decision based 
on imminent necessity? Yeah, if a bad guy is a bad guy, we
want to find him and eliminate him, but to what point? 
And if all models are wrong and we authorize lethal force, 
where does that risk end up? And that goes into the rules 
of engagement of whatever theater we're in and what is an
acceptable level of risk, because there will be times when
we're willing to accept 51 percent. 


Davis: It’s going to depend on what we're trying to kill 
him for. If we think he’s about to go into a stadium and 
blow up the entire stadium full of people, our evidence 
threshold looks a little bit different. Not only do I agree, 
but I endorse that point. In terms of distributing risk, my 
concern is that, as a practice—not in a specific isolated 
instance, but as a practice—we are distributing massive 
amounts of risk onto innocent people. I think that practice 
needs to be justified, because we're killing lots of targets 
that we should be killing, but we're likely also exposing 
others to risk, and not just risk of death or harm, but 
other kinds like broader psychological or existential risks. 
There are concerns in some cases that drone warfare has 
prevented groups from gathering in places because they're 
afraid that they're going to look suspicious, and this is
hindering democracy. So there are very small-level things that,
taken in the aggregate, could be massively problematic. So 
in answer to your question, my hunch is that it distributes 
risk in a way that stands in need of a justification. I’m not 
saying it’s unjustified; it depends on the particulars, but I 
have questions about that. 


Speaker 5: Building on a point you brought up earlier,
you've got to distinguish where the algorithm is func- 
tioning. Right now, the sensor is agnostic. The sensors are 
pulling in data. The human is looking at the data, or the 
algorithm is sifting through the data, or usually both. That's 
what we're doing right now. In my opinion, the weight 
behind your argument has to show that the algorithm is 
less accurate than the human. Having been there and done 
this firsthand, I can tell you that humans get impatient. We 
do a lot of strikes because we haven’t done a strike lately,
and it’s close enough. It’s good enough. We're pretty sure 
that they’re all bad anyway. That’s what humans are doing 
right now. 


Strawser: That’s a pretty low bar, and it could be that
you think that’s wrong, and you could think that they're 
both wrong. 


Speaker 5: True, but which is less wrong? In your argument,
you’re using the term “bad.” I don’t know that “bad”
is an appropriate term to use. Is it better or worse than the
humans who are currently making mistakes? Now we've
got some leeway. In some theaters, there’s a lot of leeway 
for making mistakes. Nobody ever finds out. Nobody ever 
knows. Nobody really cares. But is the algorithm better or 
worse than the humans? Because the humans are making 
mistakes. The sensors that are collecting the data are the 
same sensors for either side. 


Davis: Suppose we had an algorithm that replaced a jury. 
It’s a jury algorithm. We don’t need people anymore, just 
a jury algorithm, and based on all the things that are said
in the rules and whatever, it determines guilt or innocence. 
We say, look, it only sends innocent people to jail 10 
percent of the time. Let's say that human juries do it maybe 
15 to 20 percent of the time. I’m just making these numbers 
up, but let’s hypothesize. Would we say, hey, that algorithm 
is good, and we should start using it? My temptation is to 
say that both of those are way too high. I’m skeptical about 
using the 10 percent, and saying, look, the algorithm told 
us to do it. We're good! Clean hands! No need to worry 
about it, and in fact, we might be using it more often, 
taking trials to the jury algorithm more often because 
we're confident in this system. My concern is that its being
slightly better in some instances, or even on average, is not 
enough. It might be that that’s better than the alternative, 
but I’m not sure that that means we should rely on it, or we 
should use it. I see the argument in the other direction, too. 


Strawser: There are probably stories about the Man in
the Green Hat, where the algorithm got it right and the 
human operator got it wrong, and the human operator 
overruled the algorithm, and they’re like, oh, damn, we 
should have listened. It’s sort of a math question. Actu- 
ally, there are really two questions here: what is ethically 
the right evidentiary standard that needs to be justified, 
depending on the stakes, depending on all the other con- 
texts, in order to say that we're ethically justified to do this 
based on this confidence level? That's the main question, 
and then you're asking, secondarily, could this technology 
ever meet that threshold? Now, maybe your point is that 
perhaps the algorithm can or cannot, but humans aren’t 
achieving it in the first place, which is just depressing. But 
that doesn’t answer the question, right? Because they both 
could fail. 


Davis: You might say it’s the lesser of two evils, or it’s the 
better of bad alternatives, but I don’t think that means we 
should use it. It might just mean slightly better. 